- [ This file is way out of date ]
-
- Performance:
-
- - MMX ycrcb conversion to rgb
-
- - combine multiplies across weight/quant/idct prescale (?)
-
- - parse:
- - re-arrange data so that coeff blocks are all one big array
- (alignment + one big memset (mmx!) at beginning of segment)
- - more efficient bookkeeping (vs current brute force mark and sweep)
- in second and third passes of parse
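A minimal sketch of the "one big array" layout, not the project's actual data structures. The sizes (5 macroblocks per segment, 6 blocks per macroblock, 64 coefficients per block) and the names `segment_coeffs_t`, `segment_clear` and `segment_block` are assumptions made for illustration. Plain `memset` stands in for the MMX clear mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed segment geometry; adjust to the real codec layout. */
enum { MBS_PER_SEGMENT = 5, BLOCKS_PER_MB = 6, COEFFS_PER_BLOCK = 64 };
enum { SEGMENT_COEFFS = MBS_PER_SEGMENT * BLOCKS_PER_MB * COEFFS_PER_BLOCK };

/* All coefficient blocks of a segment live in one contiguous,
   16-byte-aligned array (hypothetical type). */
typedef struct {
    _Alignas(16) int16_t coeffs[SEGMENT_COEFFS];
} segment_coeffs_t;

/* One clear per segment instead of one per block. */
static void segment_clear(segment_coeffs_t *s)
{
    memset(s->coeffs, 0, sizeof s->coeffs);
}

/* Pointer to the 64 coefficients of block `blk` in macroblock `mb`. */
static int16_t *segment_block(segment_coeffs_t *s, int mb, int blk)
{
    assert(mb >= 0 && mb < MBS_PER_SEGMENT);
    assert(blk >= 0 && blk < BLOCKS_PER_MB);
    return s->coeffs + (mb * BLOCKS_PER_MB + blk) * COEFFS_PER_BLOCK;
}
```

The parser would call `segment_clear` once at the start of each segment and then write through `segment_block(&seg, mb, blk)`.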
-
- - still optimize vlc:
- - combine lookup tables that use the same index:
- - first level of classes, class_index_mask, class_index_rshift
- (all indexed by maxbits)
- - vlc_lookups, vlc_index_mask, vlc_index_rshift (all indexed by class)
- - sign_mask, sign_rshift (indexed by vlc len)
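One way the table merging could look, shown for the sign_mask/sign_rshift pair. The table contents are placeholder values, not the real codec tables, and `sign_entry_t`, `sign_tbl` and the two helper functions are hypothetical names. The point is only that tables sharing an index can be stored as one array of structs, so a single index computation fetches both fields.

```c
#include <stdint.h>

/* Before: two parallel tables indexed by the same key.
   Values are placeholders for illustration only. */
static const uint32_t sign_mask_tbl[4]   = { 0x8000, 0x4000, 0x2000, 0x1000 };
static const uint8_t  sign_rshift_tbl[4] = { 15, 14, 13, 12 };

/* After: one table whose entries hold both fields side by side. */
typedef struct {
    uint32_t mask;
    uint8_t  rshift;
} sign_entry_t;

static const sign_entry_t sign_tbl[4] = {
    { 0x8000, 15 }, { 0x4000, 14 }, { 0x2000, 13 }, { 0x1000, 12 },
};

/* Two independent table loads. */
static uint32_t sign_bit_split(uint32_t bits, unsigned idx)
{
    return (bits & sign_mask_tbl[idx]) >> sign_rshift_tbl[idx];
}

/* One entry lookup, both fields come from the same struct. */
static uint32_t sign_bit_combined(uint32_t bits, unsigned idx)
{
    const sign_entry_t *e = &sign_tbl[idx];
    return (bits & e->mask) >> e->rshift;
}
```

The same pattern would apply to the tables indexed by maxbits and by class.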
-
- - think about optimizing vlc/getbits interface based on a few
- observations:
- - the three lookups in vlc of the form ((bits & mask) >> shift) are really doing this:
- bitstream_show_skip(bs,skip,len) // show len bits, beginning skip bits from current position
- - if we add that interface, and then mmx getbits, this could free registers
- for better tuning the rest of the vlc lookup code.
- - note that skip and len are bounded to the range 0-16; it might pay
- to ensure that after flush, show can always count on at least 16 bits
- remaining in bs->current_word
- - (there are multiple shows for each flush - eliminates branch in show)
- - since we parse a whole video segment before we do idcts, we can reserve
- mmx registers for getbits state for the entire duration of parsing a video segment
- - note that bitstream state is re-initialized every time we start a new video segment
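A plain C sketch of what the proposed bitstream_show_skip interface might look like. It assumes a hypothetical `bitstream_t` whose `current_word` is a 32-bit window with the next unread bit in the most significant position, and that a prior flush has left at least skip + len valid bits in it. The real bitstream struct and any MMX version may differ.

```c
#include <stdint.h>

/* Hypothetical bitstream state: the next unread bit is the MSB of
   current_word, and a prior flush guarantees enough valid bits. */
typedef struct {
    uint32_t current_word;
} bitstream_t;

/* Show len bits, beginning skip bits from the current position,
   without consuming anything. Both skip and len must be in 0..16. */
static uint32_t bitstream_show_skip(const bitstream_t *bs,
                                    unsigned skip, unsigned len)
{
    /* Drop the skipped bits off the top of the 32-bit window. */
    uint32_t shifted = bs->current_word << skip;
    /* Widen before the right shift so that len == 0 (a shift by 32)
       stays well defined and needs no branch. */
    return (uint32_t)((uint64_t)shifted >> (32 - len));
}
```

With this layout, each ((bits & mask) >> shift) lookup corresponds to a call where shift is 32 - skip - len and mask selects len bits above that position.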
-
- - mmx version of 248 idct
-
- - tune cache footprint: access input and output without polluting L1
-
- - get everything working in Windows and use VTune to analyze and
- improve x86 performance.
-
- Documentation:
-
- - there is none!
-
- - the contents of this file have moved or will move to the project task list
- on SourceForge.
-